Adding documents to an index | Manticore Search Manual

You can connect to Manticore Search over HTTP/HTTPS.

By default Manticore listens for HTTP, HTTPS and binary requests on ports 9308 and 9312.

In section "searchd" of your configuration file the HTTP port can be defined with directive listen like this:

Both lines are valid and equal by meaning (except for the port number), they both define listeners that will serve all api/http/https protocols. There are no special requirements and any HTTP client can be used to connect to Manticore.

‹›

HTTP

HTTP

📋

⚙

searchd {
...
   listen = 127.0.0.1:9308
   listen = 127.0.0.1:9312:http
...
}

All HTTP endpoints respond with application/json content type. Most endpoints use JSON payload for requests, however there are some exceptions that use NDJSON or simple URL encoded payload.

There is no user authentication implemented at the moment, so make sure the HTTP interface is not reachable by anyone outside your network. Since Manticore acts like any other web server, you can use a reverse proxy like Nginx to add HTTP authentication or caching.

The HTTP protocol also supports SSL encryption: If you specify :https instead of :http only secured connections will be accepted. Otherwise if no valid key/cert provided, but client tries to connect via https - the connection will be dropped. If you send not HTTPS, but an HTTP request to 9443 it will answer with HTTP code 400.

‹›

HTTPS

HTTPS

📋

searchd {
...
   listen = 127.0.0.1:9308
   listen = 127.0.0.1:9443:https
...
}

Separate HTTP interface can be used to perform 'VIP' connections. A connection in this case bypasses a thread pool and always forcibly creates a new dedicated thread. That's useful for managing Manticore Search in case of a severe overload when the server would either stall or not let you connect via a regular port otherwise.

Read more about listen in this section.

‹›

VIP

📋

searchd {
...
   listen = 127.0.0.1:9308
   listen = 127.0.0.1:9318:_vip
...
}

Performing a quick search is as easy as:

‹›

CURL

CURL

📋

curl -sX POST http://localhost:9308/search -d ' {"index":"test","query":{"match":{"title":"keyword"}}}'

Endpoint /sql allows running an SQL SELECT query via HTTP JSON interface.

The query payload must be URL encoded, otherwise query statements with = (filtering or setting options) will result in an error.

The response is in JSON format and contains hits information and time of execution. The response shares the same format as json/search endpoint.

‹›

HTTP

HTTP

📋

⚙

POST /sql --data-urlencode "query=select id,subject,author_id  from forum where match('@subject php manticore') group by author_id order by id desc limit 0,5"

‹›

Response

{
  "took":10,
  "timed_out": false,
  "hits":
  {
    "total": 2,
    "hits":
    [
      {
        "_id": "1",
        "_score": 1,
        "_source": { "gid": 11 }
      },
      {
        "_id": "2",
        "_score": 1,
        "_source": { "gid": 12 }
      }
    ]
  }
}

For comfortable debugging in your browser you can set HTTP parameter mode to raw, and then the rest of the query after 'query=' will be passed inside without any substitutions/url decoding. Here's an example of how it can fail w/o the mode=raw:

‹›

HTTP

HTTP

📋

⚙

POST /sql -d "query=select id,packedfactors() from movies where match('star') option ranker=expr('1')"

‹›

Response

{"error":"query missing"}

Adding mode=raw fixes that:

‹›

HTTP

HTTP

📋

⚙

POST /sql -d "mode=raw&query=select id,packedfactors() from movies where match('star') option ranker=expr('1')"

‹›

Response

{
  "took":0,
  "timed_out":false,
  "hits":{
    "total":72,
    "hits":[
      {
        "_id":"5",
        "_score":1,
        "_source":{
          "packedfactors()":{
            "bm25":612,
            "bm25a":0.69104159,
            "field_mask":32,
            "doc_word_count":1,
            "fields":[
              {
                "field":5,
                "lcs":1,
                "hit_count":1,
                "word_count":1,
                "tf_idf":0.24835411,
                "min_idf":0.24835411,
                "max_idf":0.24835411,
                "sum_idf":0.24835411,
                "min_hit_pos":1,
                "min_best_span_pos":1,
                "exact_hit":0,
                "max_window_hits":1,
                "min_gaps":0,
                "exact_order":1,
                "lccs":1,
                "wlccs":0.24835411,
                "atc":0.000000
              }
            ],
            "words":[
              {
                "tf":1,
                "idf":0.24835411
              }
            ]
          }
        }
      },
...
    ]
  }
}

️Adding documents to a real-time index

Last modified: November 22, 2021

▪️ Adding documents to an index

If you are looking for information about adding documents to a plain index please read section about adding data from external storages.

Adding documents in a real-time manner is only supported for Real-Time and percolate indexes. Corresponding SQL command or HTTP endpoint or a client's functions inserts new rows (documents) into an existing index with provided field values.

You can insert new documents with values for all fields of the index or only part of them. In this case the other fields will be filled with their default values (0 for scalar types, empty string for text types).

Expressions are not currently supported in INSERT and the values should be explicitly specified.

The ID field/value can be omitted as RT index supports auto-id functionality. You can use "0" as the id value to force automatic ID generation. Rows with duplicate IDs will not be overwritten by INSERT. You can use REPLACE for that.

‹›

SQL
HTTP
PHP
Python
Javascript
Java

📋

General syntax:

INSERT INTO <index name> [(column, ...)]
VALUES (value, ...)
[, (...)]

INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85);
INSERT INTO products(title) VALUES ('Crossbody Bag with Tassel');
INSERT INTO products VALUES (0,'Yellow bag', 4.95);

POST /insert -d
{
  "index":"products",
  "id":1,
  "doc":
  {
    "title" : "Crossbody Bag with Tassel",
    "price" : 19.85
  }
}
POST /insert
{
  "index":"products",
  "id":2,
  "doc":
  {
    "title" : "Crossbody Bag with Tassel"
  }
}
POST /insert -d
{
  "index":"products",
  "id":0,
  "doc":
  {
    "title" : "Yellow bag"
  }
}

$index->addDocuments([
        ['id' => 1, 'title' => 'Crossbody Bag with Tassel', 'price' => 19.85]
]);
$index->addDocuments([
        ['id' => 2, 'title' => 'Crossbody Bag with Tassel']
]);
$index->addDocuments([
        ['id' => 0, 'title' => 'Yellow bag']
]);

indexApi.insert({"index" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}})
indexApi.insert({"index" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}})
indexApi.insert({"index" : "test", "id" : 0, "doc" : {{"title" : "Yellow bag"}})

res = await indexApi.insert({"index" : "test", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}});
res = await indexApi.insert({"index" : "test", "id" : 2, "doc" : {"title" : "Crossbody Bag with Tassel"}});
res = await indexApi.insert({"index" : "test", "id" : 0, "doc" : {{"title" : "Yellow bag"}});

InsertDocumentRequest newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Crossbody Bag with Tassel");
    put("price",19.85);
}};
newdoc.index("products").id(1L).setDoc(doc); 
sqlresult = indexApi.insert(newdoc);
newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Crossbody Bag with Tassel");
}};
newdoc.index("products").id(2L).setDoc(doc); 
sqlresult = indexApi.insert(newdoc);
newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
 }};
newdoc.index("products").id(0L).setDoc(doc); 
sqlresult = indexApi.insert(newdoc);

‹›

Response

Query OK, 1 rows affected (0.00 sec)
Query OK, 1 rows affected (0.00 sec)
Query OK, 1 rows affected (0.00 sec)

There is an auto ID generation functionality for column ID of documents inserted or replaced into an real-time or a Percolate index. The generator produces a unique ID of a document with some guarantees and should not be considered an auto-incremented ID.

The value of ID generated is guaranteed to be unique under the following conditions:

server_id value of the current server is in range of 0 to 127 and is unique among nodes in the cluster or it uses the default value generated from MAC address as a seed
system time does not change for the Manticore node between server restarts
auto ID is generated fewer than 16 million times per second between search server restarts

The auto ID generator creates 64 bit integer for a document ID and uses the following schema:

0 to 23 bits is a counter that gets incremented on every call to auto ID generator
24 to 55 bits is a unix timestamp of the server start
56 to 63 bits is a server_id

This schema allows to be sure that the generated ID is unique among all nodes at the cluster and that data inserted into different cluster nodes does not create collisions between the nodes.

That is why the first ID from the generator used for auto ID is NOT 1 but a larger number. Also documents stream inserted into an index might have not sequential ID values if inserts into other indexes happen between the calls as the ID generator is single in the server and shared between all its indexes.

‹›

SQL
HTTP
PHP
Python
Javascript
Java

📋

INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85);
INSERT INTO products VALUES (0,'Yello bag', 4.95);
select * from products;

POST /insert
{
  "index":"products",
  "id":0,
  "doc":
  {
    "title" : "Yellow bag"
  }
}
GET /search
{
  "index":"products",
  "query":{
    "query_string":""
  }
}

$index->addDocuments([
        ['id' => 0, 'title' => 'Yellow bag']
]);

indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}})

res = await indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag"}});

newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
 }};
newdoc.index("products").id(0L).setDoc(doc); 
sqlresult = indexApi.insert(newdoc);

‹›

Response

+---------------------+-----------+---------------------------+
| id                  | price     | title                     |
+---------------------+-----------+---------------------------+
| 1657860156022587404 | 19.850000 | Crossbody Bag with Tassel |
| 1657860156022587405 |  4.950000 | Yello bag                 |
+---------------------+-----------+---------------------------+

You can insert into a real-time index not just a single document, but as many as you want. It's ok to insert into a real-time index in batches of tens of thousands of documents. What's important to know in this case:

the larger the batch the higher is the latency of each insert operation
the larger the batch the higher indexation speed you can expect
each batch insert operation is considered a single transaction with atomicity guarantee, so you will either have all the new documents in the index at once or in case of a failure none of them will be added
you might want to increase max_packet_size value to allow bigger batches

‹›

SQL
HTTP
PHP
Python
Javascript
Java

📋

For bulk insert just provide more documents in brackets after VALUES(). The syntax is:

INSERT INTO <index name>[(column1, column2, ...)] VALUES ()[,(value1,[value2, ...])]

Optional column name list lets you explicitly specify values for some of the columns present in the index. All the other columns will be filled with their default values (0 for scalar types, empty string for string types).

For example:

INSERT INTO products(title,price) VALUES ('Crossbody Bag with Tassel', 19.85), ('microfiber sheet set', 19.99), ('Pet Hair Remover Glove', 7.99);

The syntax is in general the same as for inserting a single document, just provide more lines one for each document and use json/bulk endpoint instead of json/insert and enclose each document into node "insert". Note that it also requires:

Content-Type: application/x-ndjson
The data itself should be formatted as a newline-delimited json (NDJSON). Basically it means that each line should contain exactly one json statement and end with a newline \n and maybe \r.

POST /bulk 
-H "Content-Type: application/x-ndjson" -d '
{"insert": {"index":"products", "id":1, "doc":  {"title":"Crossbody Bag with Tassel","price" : 19.85}}}
{"insert":{"index":"products", "id":2, "doc":  {"title":"microfiber sheet set","price" : 19.99}}}
'

Use method addDocuments():

$index->addDocuments([
        ['id' => 1, 'title' => 'Crossbody Bag with Tassel', 'price' => 19.85],
        ['id' => 2, 'title' => 'microfiber sheet set', 'price' => 19.99],
        ['id' => 3, 'title' => 'Pet Hair Remover Glove', 'price' => 7.99]
]);

docs = [ \
    {"insert": {"index" : "products", "id" : 1, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}}, \
    {"insert": {"index" : "products", "id" : 2, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}}, \
    {"insert": {"index" : "products", "id" : 3, "doc" : {"title" : "CPet Hair Remover Glove", "price" : 7.99}}}
]
res = indexApi.bulk('\n'.join(map(json.dumps,docs)))

let docs = [ 
    {"insert": {"index" : "products", "id" : 3, "doc" : {"title" : "Crossbody Bag with Tassel", "price" : 19.85}}}, 
    {"insert": {"index" : "products", "id" : 4, "doc" : {"title" : "microfiber sheet set", "price" : 19.99}}}, 
    {"insert": {"index" : "products", "id" : 5, "doc" : {"title" : "CPet Hair Remover Glove", "price" : 7.99}}}
];
res =  await indexApi.bulk(docs.map(e=>JSON.stringify(e)).join('\n'));

String body = "{\"insert\": {\"index\" : \"products\", \"id\" : 1, \"doc\" : {\"title\" : \"Crossbody Bag with Tassel\", \"price\" : 19.85}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 4, \"doc\" : {\"title\" : \"microfiber sheet set\", \"price\" : 19.99}}}"+"\n"+
    "{\"insert\": {\"index\" : \"products\", \"id\" : 5, \"doc\" : {\"title\" : \"CPet Hair Remover Glove\", \"price\" : 7.99}}}"+"\n";         
BulkResponse bulkresult = indexApi.bulk(body);

‹›

Response

Query OK, 3 rows affected (0.01 sec)

Expressions are not currently supported in INSERT and values should be explicitly specified.

Multi-value attributes (MVA) are inserted as arrays of numbers.

‹›

SQL
HTTP
PHP
Python
Javascript
Java

📋

INSERT INTO products(title, sizes) VALUES('shoes', (40,41,42,43));

POST /insert
{
  "index":"products",
  "id":1,
  "doc":
  {
    "title" : "shoes",
    "sizes" : [40, 41, 42, 43]
  }
}

$index->addDocument(
  ['title' => 'shoes', 'sizes' => [40,41,42,43]],
  1
);

indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}})

res = await indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","sizes":[40,41,42,43]}});

newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
    put("sizes",new int[]{40,41,42,43});
 }};
newdoc.index("products").id(0L).setDoc(doc); 
sqlresult = indexApi.insert(newdoc);

JSON value can be inserted as as an escaped string (via SQL, HTTP, PHP) or as a JSON object (via HTTP).

‹›

SQL
HTTP
PHP
Python
Javascript
Java

📋

INSERT INTO products VALUES (1, 'shoes', '{"size": 41, "color": "red"}');

JSON value can be inserted as as JSON object

POST /insert
{
  "index":"products",
  "id":1,
  "doc":
  {
    "title" : "shoes",
    "meta" : {
      "size": 41,
      "color": "red"
    }
  }
}

JSON value can be also inserted as a string containing escaped JSON:

POST /insert
{
  "index":"products",
  "id":1,
  "doc":
  {
    "title" : "shoes",
    "meta" : "{\"size\": 41, \"color\": \"red\"}"
  }
}

$index->addDocument(
  ['title' => 'shoes', 'meta' => '{"size": 41, "color": "red"}'],
  1
);

indexApi = api = manticoresearch.IndexApi(client)
indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}})

res = await indexApi.insert({"index" : "products", "id" : 0, "doc" : {"title" : "Yellow bag","meta":'{"size": 41, "color": "red"}'}});

newdoc = new InsertDocumentRequest();
HashMap<String,Object> doc = new HashMap<String,Object>(){{
    put("title","Yellow bag");
    put("meta", 
        new HashMap<String,Object>(){{
            put("size",41);
            put("color","red");
        }});
 }};
newdoc.index("products").id(0L).setDoc(doc); 
sqlresult = indexApi.insert(newdoc);

HTTP Adding rules to a percolate index

Last modified: February 10, 2021

In a percolate index are stored documents that are percolate query rules and have to follow the exact schema of 4 fields:

field	type	description
id	bigint	PQ rule identifier (if omitted, will be assigned automatically)
query	string	full-text query (can be empty) compatible with the percolate index
filters	string	additional filters by non-full-text fields (can be empty) compatible with the percolate index
tags	string	string with one or many comma-separated tags, which may be used to selectively show/delete saved queries

Any other field names are not supported and will trigger an error.

Warning: inserting/replacing JSON-formatted PQ rules via SQL will not work, i.e. the JSON-specific operators (match etc) will be considered just parts of the rule's text that should match with documents. If you prefer JSON syntax use the HTTP endpoint instead of INSERT/REPLACE.

‹›

SQL
HTTP
PHP
Python
Javascript
Java

📋

INSERT INTO pq(id, query, filters) VALUES (1, '@title shoes', 'price > 5');
INSERT INTO pq(id, query, tags) VALUES (2, '@title bag', 'Louis Vuitton');
SELECT * FROM pq;

There are two way you can add a percolate query into a percolate index:

query in JSON /search compatible format, described at json/search

PUT /pq/pq_index/doc/1
{
"query": {
  "match": {
    "title": "shoes"
  },
  "range": {
    "price": {
      "gt": 5
    }
  }
},
"tags": ["Loius Vuitton"]
}

query in SQL format, described at search query syntax

PUT /pq/pq_index/doc/2
{
"query": {
  "ql": "@title shoes"
},
"filters": "price > 5",
"tags": ["Loius Vuitton"]
}

$newstoredquery = [
    'index' => 'test_pq',
    'body' => [
        'query' => [
                       'match' => [
                               'title' => 'shoes'
                       ]
               ],
               'range' => [
                       'price' => [
                               'gt' => 5
                       ]
               ]
       ],
    'tags' => ['Loius Vuitton']
];
$client->pq()->doc($newstoredquery);

newstoredquery ={"index" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}}
indexApi.insert(newstoredquery)

newstoredquery ={"index" : "test_pq", "id" : 2, "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}};
indexApi.insert(newstoredquery);

newstoredquery = new HashMap<String,Object>(){{
    put("query",new HashMap<String,Object >(){{
        put("q1","@title shoes");
        put("filters","price>5");
        put("tags",new String[] {"Loius Vuitton"});
    }});
}};
newdoc.index("test_pq").id(2L).setDoc(doc); 
indexApi.insert(newdoc);

‹›

Response

+------+--------------+---------------+---------+
| id   | query        | tags          | filters |
+------+--------------+---------------+---------+
|    1 | @title shoes |               | price>5 |
|    2 | @title bag   | Louis Vuitton |         |
+------+--------------+---------------+---------+

In case you don't specify the ID it's assigned automatically. Read more about auto-ID here.

‹›

SQL
HTTP
PHP
Python
Javascript
Java

📋

INSERT INTO pq(query, filters) VALUES ('wristband', 'price > 5');
SELECT * FROM pq;

PUT /pq/pq_index/doc
{
"query": {
  "match": {
    "title": "shoes"
  },
  "range": {
    "price": {
      "gt": 5
    }
  }
},
"tags": ["Loius Vuitton"]
}
PUT /pq/pq_index/doc
{
"query": {
  "ql": "@title shoes"
},
"filters": "price > 5",
"tags": ["Loius Vuitton"]
}

$newstoredquery = [
    'index' => 'pq_index',
    'body' => [
        'query' => [
                       'match' => [
                               'title' => 'shoes'
                       ]
               ],
               'range' => [
                       'price' => [
                               'gt' => 5
                       ]
               ]
       ],
    'tags' => ['Loius Vuitton']
];
$client->pq()->doc($newstoredquery);

indexApi = api = manticoresearch.IndexApi(client)
newstoredquery ={"index" : "test_pq",   "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}}
indexApi.insert(store_query)

newstoredquery ={"index" : "test_pq",  "doc" : {"query": {"ql": "@title shoes"},"filters": "price > 5","tags": ["Loius Vuitton"]}};
res =  await indexApi.insert(store_query);

newstoredquery = new HashMap<String,Object>(){{
    put("query",new HashMap<String,Object >(){{
        put("q1","@title shoes");
        put("filters","price>5");
        put("tags",new String[] {"Loius Vuitton"});
    }});
}};
newdoc.index("test_pq").setDoc(doc); 
indexApi.insert(newdoc);

‹›

Response

+---------------------+-----------+------+---------+
| id                  | query     | tags | filters |
+---------------------+-----------+------+---------+
| 1657843905795719192 | wristband |      | price>5 |
+---------------------+-----------+------+---------+

{
  "index": "pq_index",
  "type": "doc",
  "_id": "1657843905795719196",
  "result": "created"
}
{
  "index": "pq_index",
  "type": "doc",
  "_id": "1657843905795719198",
  "result": "created"
}

Array(
       [index] => pq_index
       [type] => doc
       [_id] => 1657843905795719198
       [result] => created
)

{'created': True,
 'found': None,
 'id': 1657843905795719198,
 'index': 'test_pq',
 'result': 'created'}

{"_index":"test_pq","_id":1657843905795719198,"created":true,"result":"created"}

In case of omitted schema in SQL INSERT command the following parameters are expected:

id. You can use 0 as the ID to trigger auto id generation
query - full-text query
tags - pq rule tags string
filters - additional filters by attributes

‹›

SQL

📋

INSERT INTO pq VALUES (0, '@title shoes', '', '');
INSERT INTO pq VALUES (0, '@title shoes', 'Louis Vuitton', '');
SELECT * FROM pq;

‹›

Response

+---------------------+--------------+---------------+---------+
| id                  | query        | tags          | filters |
+---------------------+--------------+---------------+---------+
| 2810855531667783688 | @title shoes |               |         |
| 2810855531667783689 | @title shoes | Louis Vuitton |         |
+---------------------+--------------+---------------+---------+

To replace an existing PQ rule with a new one in SQL just use a regular REPLACE command. There's a special syntax ?refresh=1 to replace a PQ rule defined in JSON mode via HTTP JSON interface.

‹›

SQL
HTTP

📋

mysql> select * from pq;
+---------------------+--------------+------+---------+
| id                  | query        | tags | filters |
+---------------------+--------------+------+---------+
| 2810823411335430148 | @title shoes |      |         |
+---------------------+--------------+------+---------+
1 row in set (0.00 sec)
mysql> replace into pq(id,query) values(2810823411335430148,'@title boots');
Query OK, 1 row affected (0.00 sec)
mysql> select * from pq;
+---------------------+--------------+------+---------+
| id                  | query        | tags | filters |
+---------------------+--------------+------+---------+
| 2810823411335430148 | @title boots |      |         |
+---------------------+--------------+------+---------+
1 row in set (0.00 sec)

️Adding documents to a real-time index ️ Adding data from external storages

Last modified: May 11, 2021

HTTP

Configuration

VIP connection

Connecting with cURL

SQL over HTTP

▪️ Adding documents to an index

Adding documents to a real-time index

Auto ID

Bulk adding documents

Inserting multi-value attributes (MVA) values

Inserting JSON

Adding rules to a percolate index

Auto ID provisioning

No schema in SQL

Replacing rules in a PQ index